In a data processing system which employs a cache memory feature, a method and exemplary special
purpose apparatus for practicing the method are disclosed to lower the cache miss ratio for called operands.
Recent cache misses are stored in a first in, first out miss stack, and the stored addresses are searched for
displacement patterns thereamong. Any detected pattern is then employed to predict a succeeding cache miss
by prefetching from main memory the signal identified by the predictive address. The apparatus for performing
this task is preferably hard wired for speed purposes and includes subtraction circuits for evaluating variously
displaced addresses in the miss stack and comparator circuits for determining if the outputs from at least two
subtraction circuits are the same, indicating a pattern yielding information which can be combined with an
address in the stack to develop a predictive address.
CACHE MISS PREDICTION METHOD AND APPARATUS



Field of the Invention



This invention relates to the art of data processing systems which include a cache memory feature and,
more particularly, to a method and apparatus for predicting memory cache misses for operand calls and
using this information to transfer data from a main memory to cache memory to thereby lower the cache
miss ratio.



Background of the Invention



The technique of employing a high speed cache memory intermediate a processor and a main memory
to hold a dynamic subset of the information in the main memory in order to speed up system operation is
well known in the art. Briefly, the cache holds a dynamically variable collection of main memory information
fragments selected and updated such that there is a good chance that the fragments will include
instructions and/or data required by the processor in upcoming operations. If there is a cache "hit" on a
given operation, the information is available to the processor much faster than if main memory had to be
accessed.



The key to obtaining a low cache miss ratio is obviously one of carefully selecting the information to be
placed in the cache from main memory at any given instant. There are several techniques for selecting
blocks of instructions for transitory residence in the cache, and the more or less linear use of instructions in
programming renders these techniques statistically effective. However, the selection of operand information
to be resident in cache memory at a given instant has been much less effective and has been generally
limited to transferring one or more contiguous blocks including a cache miss address. This approach only
slightly lowers the cache miss ratio and is also an ineffective use of cache capacity.

It would therefore be highly desirable to provide means for
selecting operand information for transitory storage in a cache memory in such a manner as to significantly
lower the cache miss ratio, and it is to that end that the present invention is directed.

Objects of the Invention

It is therefore a broad object of this invention to provide an improved cache memory in a data
processing system.

It is another object of this invention to provide a cache memory particularly characterized by exhibiting
an improved cache miss ratio in operation.

It is a more specific object of this invention to provide a cache memory incorporating circuitry for
effectively predicting cache misses.

In another aspect, it is an object of this invention to provide a cache memory selection process in which
the cache miss ratio for operands is significantly lowered.



Summary of the Invention 



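Briefly, these and other objects of the invention are achieved by special purpose apparatus in the cache
memory which stores recent cache misses and searches for patterns therein. Any detected pattern is then
employed to anticipate a succeeding cache miss by prefetching from main memory the block containing
the predicted cache miss.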



Description of the Drawing



The subject matter of the invention is particularly pointed out and distinctly claimed in the concluding
portion of the specification. The invention, however, both as to organization and method of operation, may
best be understood by reference to the following description taken in conjunction with the subjoined claims
and the accompanying drawing of which:

FIG. 1 is a generalized block diagram of a typical data processing system employing a cache
memory and therefore constituting an exemplary environment for practicing the invention;

FIG. 2 is a flow diagram illustrating, in simplified form, the sequence of operations by which the
invention is practiced;

FIG. 3 is a logic diagram of a simple exemplary embodiment of the invention; and

FIG. 4 is a logic diagram of a more powerful exemplary embodiment of the invention.


Detailed Description of the Invention 

Referring now to FIG. 1, there is shown a high level block diagram for a data processing system
incorporating a cache memory feature. Those skilled in the art will appreciate that this block diagram is only
exemplary and that many variations on it are employed in practice. Its function is merely to provide a
context for discussing the subject invention. Thus, the illustrative data processing system includes a main
memory unit 13 which stores the data signal groups (i.e., information words, including instructions and
operands) required by a central processing unit 14 to execute the desired procedures. Signal groups with
an enhanced probability for requirement by the central processing unit 14 in the near term are transferred
from the main memory unit 13 (or a user unit 15) through a system interface unit 11 to a cache memory
unit 12. (Those skilled in the art will understand that, in some data processing system architectures, the
signal groups are transferred over a system bus, thereby requiring an interface unit for each component
interacting with the system bus.) The signal groups are stored in the cache memory unit 12 until requested
by the central processing unit 14. To retrieve the correct signal group, address translation apparatus 16 is
typically incorporated to convert a virtual address (used by the central processing unit 14 to identify the
signal group to be fetched) to the real address used for that signal group by the remainder of the data
processing system to identify the signal group.

The information stored transiently in the cache memory unit 12 may include both instructions and
operands stored in separate sections or stored homogeneously. Preferably, in the practice of the present
invention, instructions and operands are stored in separate (at least in the sense that they do not have
commingled addresses) memory sections in the cache memory unit 12 inasmuch as it is intended to invoke
the operation of the present invention as to operand information only.

The present invention is based on recognizing and taking advantage of sensed patterns in cache
misses resulting from operand calls. In an extremely elementary example, consider a sensed pattern in
which three consecutive misses ABC are, in fact, successive operand addresses with D being the next
successive address. This might take place, merely by way of example, in a data manipulation process
calling for successively accessing successive rows in a single column of data. If this pattern is sensed, the
likelihood that signal group D will also be accessed, and soon, is enhanced such that its prefetching into the
cache memory unit 12 is in order.

The fundamental principles of the invention are set forth in the operational flow chart of FIG. 2. When a
processor (or other system unit) asks for an operand, a determination is made as to whether or not the
operand is currently resident in the cache. If so, there is a cache hit (i.e., no cache miss), the operand is
sent to the requesting system unit and the next operand request is awaited. However, if there is a cache
miss, the request is, in effect, redirected to the (much slower) main memory.

Those skilled in the art will understand that the description to this point of FIG. 2 describes cache
memory operation generally. In the present invention, however, the address of the cache miss is
meaningful. It is therefore placed at the top of a miss stack to be described in further detail below. The miss
stack (which contains a history of the addresses of recent cache misses in consecutive order) is then
examined to determine if a first of several patterns is present. This first pattern might be, merely by way of
example, contiguous addresses for the recent cache misses. If the first pattern is not sensed, additional
patterns are tried. Merely by way of example again, a second pattern might be recent cache misses calling
for successive addresses situated two locations apart. So long as there is no pattern match, the process
continues through the pattern repertoire. If there is no match when all patterns in the repertoire have been
examined, the next cache miss is awaited to institute the process anew.

However, if a pattern in the repertoire is detected, a predictive address is calculated from the
information in the miss stack and from the sensed pattern. This predictive address is then employed to
prefetch from main memory into cache the signal group identified by the predictive address. In the
elementary example previously given, if a pattern is sensed in which the operand addresses ABC of
consecutive cache misses are consecutive and contiguous, the value of the predictive address, D, will be
C + 1.

In order to optimize the statistical integrity of the miss stack, the predictive address itself may be
placed at the top of the stack since it would (highly probably) itself have been the subject of a cache miss if
it had not been prefetched in accordance with the invention.
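
Merely by way of illustration, the flow of FIG. 2 can be modeled in software. The following Python fragment is a minimal sketch under stated assumptions and is not the disclosed apparatus: the class name `MissPredictor`, the stack depth of sixteen, the repertoire of displacements (1, 2) and the decision to examine only the three most recent misses are all chosen for the example.

```python
from collections import deque


class MissPredictor:
    """Toy software model of the FIG. 2 flow (illustrative names and sizes)."""

    def __init__(self, depth=16, strides=(1, 2), push_prediction=True):
        # Index 0 is the top of the miss stack (most recent entry).
        self.stack = deque(maxlen=depth)
        # Assumed pattern repertoire: displacements tried in this order.
        self.strides = strides
        # Optionally place the predictive address itself onto the stack.
        self.push_prediction = push_prediction

    def on_miss(self, address):
        """Record a cache miss; return a predictive address or None."""
        self.stack.appendleft(address)
        if len(self.stack) < 3:
            return None  # not enough history to sense a pattern
        newest, middle, oldest = self.stack[0], self.stack[1], self.stack[2]
        for stride in self.strides:
            if newest - middle == stride and middle - oldest == stride:
                predicted = newest + stride  # e.g. D = C + 1 for stride 1
                if self.push_prediction:
                    self.stack.appendleft(predicted)
                return predicted
        return None  # no match: await the next cache miss


predictor = MissPredictor()
for miss in (100, 101, 102):
    prediction = predictor.on_miss(miss)
print(prediction)
```

In this sketch the caller would prefetch the signal group at the returned address whenever the result is not `None`. With `push_prediction` left enabled, the predictive address sits at the top of the stack, so a later miss that continues the sequence can still be matched against it.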

Since speed of operation is essential, the invention may advantageously be embodied in a hard wired
form (e.g., in a gate array) although firmware control is contemplated. Consider first a relatively simple
hardwired implementation shown in FIG. 3. A miss stack 20 holds the sixteen most recent cache miss
addresses, the oldest being identified as address P with entry onto the stack being made at the top. Four
four-input electronic switches 21, 22, 23, 24 are driven in concert by a shift pattern signal via line 25 such
that: in a first state, addresses A, B, C, D appear at the respective outputs of the switches; in a second state,
addresses B, D, F, H appear at the outputs; in a third state, addresses C, F, I, L appear at the outputs; and
in a fourth state, addresses D, H, L, P appear at the outputs. Subtraction circuits 26, 27, 28 are connected
to receive as inputs the respective outputs of the electronic switches 21, 22, 23, 24 such that: the output
from the subtraction circuit 26 is the output of the switch 21 minus the output of the switch 22; the output
from the subtraction circuit 27 is the output of the switch 22 minus the output of the switch 23; and the
output from the subtraction circuit 28 is the output of the switch 23 minus the output of the switch 24.

The output from the subtraction circuit 26 is applied to one input of an adder circuit 31 which has its
other input driven by the output of the electronic switch 21. In addition, the output from the subtraction
circuit 26 is also applied to one input of a comparator circuit 29. The output from the subtraction circuit 27
is applied to the other input of the comparator circuit 29 and also to one input of another comparator circuit
30 which has its other input driven by the output of the subtraction circuit 28. The outputs from the
comparator circuits 29, 30 are applied, respectively, to the two inputs of an AND-gate 32 which selectively
issues a prefetch enable signal.

Consider now the operation of the circuit shown in FIG. 3. As previously noted, miss stack 20 holds the
last sixteen cache miss addresses, address A being the most recent. When the request for the signal group
identified by address A results in a cache miss, circuit operation is instituted to search for a pattern among
the addresses resident in the miss stack. The electronic switches 21, 22, 23, 24 are at their first state such
that address A is passed through to the output of switch 21, address B appears at the output of switch 22,
address C appears at the output of switch 23 and address D appears at the output of switch 24. If the
differences between A and B, B and C, and C and D are not all equal, not all the outputs from the
subtraction circuits 26, 27, 28 will be equal such that one or both the comparator circuits 29, 30 will issue a
no compare; and AND-gate 32 will not be enabled, thus indicating a "no pattern match found" condition.
The switches are then advanced to their second state in which addresses B, D, F, H appear at their
respective outputs. Assume now that (B - D) = (D - F) = (F - H); i.e., a sequential pattern has been sensed
in the address displacements. Consequently, both the comparators 29, 30 will issue compare signals to fully
enable the AND-gate 32 and produce a prefetch enable signal. Simultaneously, the output from the adder
circuit 31 will be the predictive address (B + (B - D)). It will be seen that this predictive address extends the
sensed pattern and thus increases the probability that the prefetched signal group will be requested by the
processor, thereby lowering the cache miss ratio.

If a pattern had not been sensed in the address combination BDFH, the electronic switches would
have been advanced to their next state to examine the address combination CFIL and then on to the
address combination DHLP if necessary. If no pattern was sensed, the circuit would await the next cache
miss which will place a new entry at the top of the miss stack and push address P out the bottom of the
stack before the pattern match search is again instituted.
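
The operation of the FIG. 3 circuit just described can be followed in a short software model. The Python fragment below is a sketch only: the function name `fig3_predict`, the representation of the miss stack as a sixteen-element list (index 0 holding address A) and the variable names borrowed from the reference numerals are assumptions of the example, and real hardware would evaluate each switch state in logic rather than in a loop.

```python
# Stack positions selected by the four switches in each state,
# with index 0 = address A (most recent) and index 15 = address P (oldest).
SWITCH_STATES = (
    (0, 1, 2, 3),    # first state:  A, B, C, D
    (1, 3, 5, 7),    # second state: B, D, F, H
    (2, 5, 8, 11),   # third state:  C, F, I, L
    (3, 7, 11, 15),  # fourth state: D, H, L, P
)


def fig3_predict(miss_stack):
    """Return (state, predictive_address) for the first match, else None."""
    if len(miss_stack) != 16:
        raise ValueError("this sketch expects exactly sixteen miss addresses")
    for state, taps in enumerate(SWITCH_STATES):
        s21, s22, s23, s24 = (miss_stack[i] for i in taps)
        d26 = s21 - s22          # subtraction circuit 26
        d27 = s22 - s23          # subtraction circuit 27
        d28 = s23 - s24          # subtraction circuit 28
        cmp29 = d26 == d27       # comparator circuit 29
        cmp30 = d27 == d28       # comparator circuit 30
        if cmp29 and cmp30:      # AND-gate 32: prefetch enable
            return state, s21 + d26  # adder circuit 31
    return None


# Addresses B, D, F, H (indices 1, 3, 5, 7) are evenly displaced by 10;
# the other entries are arbitrary values chosen so the first state fails.
stack = [7, 400, 900, 390, 13, 380, 55, 370, 1, 2, 4, 8, 16, 32, 64, 128]
print(fig3_predict(stack))
```

Here the first state finds no match, while the second state senses equal displacements among B, D, F and H and yields B + (B - D), as in the text above.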

Consider now the somewhat more complex and powerful embodiment of the invention illustrated in FIG.
4. Electronic switches 41, 42, 43, 44 receive at their respective inputs recent cache miss addresses as
stored in the miss stack 40 in the exemplary arrangement shown. It will be noted that each of the electronic
switches 41, 42, 43, 44 has eight inputs which can be sequentially selectively transferred to the single
outputs under the influence of the shift pattern signal. It will also be noted that the miss stack 40 stores, in
addition to the sixteen latest cache miss addresses A - P, three future entries WXY. Subtraction circuits 45,
46, 47 perform the same office as the corresponding subtraction circuits 26, 27, 28 of the FIG. 3
embodiment previously described. Similarly, adder circuit 48 corresponds to the adder circuit 31 previously
described.

Comparator circuit 49 receives the respective outputs of the subtraction circuits 45, 46, and its output is
applied to one input of an AND-gate 38 which selectively issues the prefetch enable signal. Comparator
circuit 50 receives the respective outputs of the subtraction circuits 46, 47, but, unlike its counterpart
comparator 30 of the FIG. 3 embodiment, its output is applied to one input of an OR-gate 39 which has its
other input driven by a reduce lookahead signal. The output of OR-gate 39 is coupled to the other input of
AND-gate 38. With this arrangement, activation of the reduce lookahead signal enables OR-gate 39 and
partially enables AND-gate 38. The effect of applying the reduce lookahead signal is to compare only the
outputs of the subtraction circuits 45, 46 in the comparator circuit 49 such that a compare fully enables the
AND-gate 38 to issue the prefetch enable signal. This mode of operation may be useful, for example, when
the patterns seem to be changing every few cache misses, and it favors the most recent examples.
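
The gating just described reduces to a small Boolean expression. The following Python sketch is offered only as an illustration; the function name `prefetch_enable` and its parameter names are assumptions of the example, with `d45`, `d46` and `d47` standing for the outputs of subtraction circuits 45, 46 and 47.

```python
def prefetch_enable(d45, d46, d47, reduce_lookahead=False):
    """Model of comparators 49, 50, OR-gate 39 and AND-gate 38 of FIG. 4."""
    cmp49 = d45 == d46                 # comparator circuit 49
    cmp50 = d46 == d47                 # comparator circuit 50
    or39 = cmp50 or reduce_lookahead   # OR-gate 39
    return cmp49 and or39              # AND-gate 38


# The two newest displacements agree (4, 4) but the oldest differs (9).
print(prefetch_enable(4, 4, 9))
print(prefetch_enable(4, 4, 9, reduce_lookahead=True))
```

With the reduce lookahead signal inactive, all three displacements must agree; with it active, comparator 50 is effectively removed from consideration and agreement between the first two displacements suffices.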

With the arrangement of FIG. 4, it is advantageous to try all the patterns within pattern groups (as
represented by the "YES" response to the ">1 PATTERN GROUP?" query in the flow diagram of FIG. 2)
even if there is a pattern match detected intermediate the process. This follows from the fact that more than
one of the future entries WXY to the miss stack may be developed during a single pass through the pattern
repertoire or even a subset of the pattern repertoire. With the specific implementation of FIG. 4 (which is
only exemplary of many possible useful configurations), the following results are obtainable:

The goal states are searched in groups by switch state; i.e.: Group 1 includes switch states 0, 1, 2 and
could result in filling future entries WXY; Group 2 includes states 3, 4 and could result in filling entries WX;
Group 3 includes states 5, 6 and could also result in filling entries WX; and Group 4 includes state 7 and
could result in filling entry W. When a goal state is reached that has been predicted, the search is halted for
the current cache miss; i.e., it would not be desirable to replace an already developed predictive address W
with a different predictive address W.
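
The search by groups of switch states can be sketched as follows, in the manner of the grouped search recited in Claim 3: patterns in the current group are tried until one is sensed, after which the next group becomes current. This Python fragment is a hedged illustration only. The grouping of states 0 to 7 follows the text above, but the table `HYPOTHETICAL_TAPS` (which stack positions each switch state selects) is invented for the example because the wiring of FIG. 4 is not reproduced here, and neither the filling of the future entries W, X, Y nor the halting rule is modeled.

```python
# Groups of switch states as listed in the text (Group 1 .. Group 4).
GROUPS = ((0, 1, 2), (3, 4), (5, 6), (7,))

# HYPOTHETICAL tap table: stack positions selected in each switch state.
# Index 0 is the most recent miss. These values are assumptions.
HYPOTHETICAL_TAPS = {
    0: (0, 1, 2, 3),
    1: (1, 2, 3, 4),
    2: (2, 3, 4, 5),
    3: (0, 2, 4, 6),
    4: (1, 3, 5, 7),
    5: (0, 3, 6, 9),
    6: (1, 4, 7, 10),
    7: (0, 4, 8, 12),
}


def grouped_search(miss_stack, groups=GROUPS, taps=HYPOTHETICAL_TAPS):
    """Return a list of (switch_state, predictive_address), one per group at most."""
    predictions = []
    for group in groups:
        for state in group:
            a, b, c, d = (miss_stack[i] for i in taps[state])
            if a - b == b - c == c - d:      # equal displacements sensed
                predictions.append((state, a + (a - b)))
                break                        # assign the next group as current
    return predictions


stack = [7, 400, 900, 390, 13, 380, 55, 370, 1, 2, 4, 8, 16, 32, 64, 128]
print(grouped_search(stack))
```

A single pass can thus yield more than one predictive address, one from each group in which a pattern is sensed, which is the property the grouped search is meant to expose.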

Those skilled in the art will understand that the logic circuitry of FIGs. 3 and 4 is somewhat simplified
since multiple binary digit information is presented as if it were single binary digit information. Thus, in
practice, arrays of electronic switches, gates, etc. will actually be employed to handle the added dimension
as may be necessary and entirely conventionally. Further, timing signals and logic for incorporating the
inventive structure into a given data processing system environment will be those appropriate for that
environment and will be the subject of straightforward logic design.

Thus, while the principles of the invention have now been made clear in an illustrative embodiment,
there will be immediately obvious to those skilled in the art many modifications of structure, arrangements,
proportions, the elements, materials, and components, used in the practice of the invention which are
particularly adapted for specific environments and operating requirements without departing from those
principles.



Claims 

1. In a data processing system incorporating a cache memory, a method for predicting cache miss
addresses comprising the steps of:

A) establishing a miss stack for storing a plurality of cache miss addresses;

B) waiting for a cache miss;

C) when a cache miss occurs, placing the address of the called information onto the top of the miss
stack;

D) examining the miss stack for an address pattern among the resident cache miss addresses;

E) if a pattern is not sensed, returning to step B); and

F) if a pattern is sensed:

1) using the sensed pattern and at least one of the addresses in the miss stack to calculate a
predictive address;

2) prefetching into cache memory the signal group identified by the predictive address; and

3) returning to step B).

2. The method of Claim 1 in which, during step D), a repertoire of predetermined address patterns are
searchable, and the examination continues from pattern to pattern until a first pattern match is sensed.

3. In a data processing system incorporating a cache memory, a method for predicting cache miss
addresses comprising the steps of:

A) establishing a miss stack for storing a plurality of cache miss addresses;

B) waiting for a cache miss;

C) when a cache miss occurs, placing the address of the called information onto the top of the miss
stack;

D) examining the cache miss addresses resident in the miss stack for a match with a selected
address pattern in a current group of a plurality of groups of patterns;

E) if the selected pattern is not sensed, determining if all the patterns in the current group have been
examined;

F) if all the patterns in the current group have not been examined, selecting another pattern in the
current group and returning to step D);

G) if all the patterns in all the groups in the pattern repertoire have been searched, returning to step
B);

H) if all the patterns in the current group have been examined, assigning another group as the current
group, selecting a pattern from the new current group and returning to step D); and

I) if the selected pattern is sensed:

1) using the sensed pattern and at least one of the addresses in the miss stack to calculate a
predictive address;

2) prefetching into cache memory the signal group identified by the predictive address; and

3) assigning another group as the current group and returning to step D).

4. The method of Claim 3 in which, intermediate substeps I)1) and I)3), there is performed substep I)2)a)
in which the predictive address is placed onto the miss stack.

5. Apparatus for developing a predictive address for prefetching information into a cache memory
comprising:

A) a first in, first out stack for storing a plurality of addresses representing cache misses;

B) a plurality of electronic switch means each having a plurality of address inputs and a single
address output;

C) means coupling said addresses stored in said stack individually to said electronic switch means
inputs in predetermined orders;

D) means for switching said electronic switch means to transfer said addresses applied to said
electronic switch means inputs to said electronic switch outputs to establish at said electronic switch
outputs predetermined combinations of said addresses;

E) at least two subtraction circuit means each coupled to receive a pair of different addresses from
said electronic switch means outputs and to issue a value representing the displacement therebetween;

F) at least one comparator circuit means coupled to receive a pair of outputs from a corresponding
pair of said subtraction circuit means and responsive thereto for issuing a prefetch enable logic signal if
there is a compare condition; and

G) predictive address development means adapted to combine one of said addresses appearing at
one of said electronic switch outputs and displacement information appearing at one of said subtraction
circuit means to obtain a predictive address;

whereby, the coordinated presence of said predictive address and said prefetch enable logic signal causes
a signal group identified by said predictive address to be prefetched into said cache memory.

6. The apparatus of Claim 5 which includes at least three of said subtraction circuit means and at least
two of said comparator circuit means and which further comprises:

A) AND-gate means having separate inputs respectively receiving outputs coupled from said at least two
comparator circuit means, said AND-gate selectively issuing said prefetch enable logic signal only when
fully enabled.

7. The apparatus of Claim 6 which further includes:

A) OR-gate means driving at least one input to said AND-gate means, said OR-gate means having inputs
receiving:

1. outputs coupled from at least one of said comparator circuits; and
2. a selectively applied reduce lookahead logic signal;

whereby, application of said reduce lookahead signal to said OR-gate means partially enables said
AND-gate means driven thereby and thus eliminates said at least one of said comparator circuits from
consideration in the issuance of said prefetch enable logic signal.